Keywords in Context (Using n-grams) with Python
Similar resources
Protein classification using modified n-grams and skip-grams.
Motivation: Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce...
Forecasting Conflicts Using N-Grams Models
Analyzing international political behavior based on similar precedent circumstances is one of the basic techniques that policymakers use to monitor and assess current situations. Our goal is to investigate how to analyze geopolitical conflicts as sequences of events and to determine what probabilistic models are suitable to perform these analyses. In this paper, we evaluate the performance of N...
Embedded Malware Detection Using Markov n-Grams
Embedded malware is a recently discovered security threat that allows malcode to be hidden inside a benign file. It has been shown that embedded malware is not detected by commercial antivirus software even when the malware signature is present in the antivirus database. In this paper, we present a novel anomaly detection scheme to detect embedded malware. We first analyze byte sequences in ben...
Spam Detection Using Character N-Grams
This paper presents a content-based approach to spam detection based on low-level information. Instead of the traditional 'bag of words' representation, we use a 'bag of character n-grams' representation which avoids the sparse data problem that arises in n-grams on the word-level. Moreover, it is language-independent and does not require any lemmatizer or 'deep' text preprocessing. Based on ex...
Comparing Medline citations using modified N-grams
OBJECTIVE: We aim to identify duplicate pairs of Medline citations, particularly when the documents are not identical but contain similar information. MATERIALS AND METHODS: Duplicate pairs of citations are identified by comparing word n-grams in pairs of documents. N-grams are modified using two approaches which take account of the fact that the document may have been altered. These are: (1) d...
Journal
Journal title: The Programming Historian
Year: 2012
ISSN: 2397-2068
DOI: 10.46430/phen0010